AITopics | speaker detection and speech enhancement

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

#artificialintelligence | Mar-7-2022, 23:00:33 GMT

Such audio-visual events not only play a critical role in human perception in everyday social life but are also involved in diverse human-computer interaction scenarios, e.g., multi-modal robot dialogue systems or in-vehicle AI navigation systems. As shown in Fig. 1, the driver of an autonomous vehicle can easily interact with an intelligent driver-assistance system dedicated to that driver. In many cases, however, noise coming from the rear of the vehicle becomes an interfering signal in this human-computer interaction and often prevents the assistant from accurately extracting the driver's instructions and responding accordingly. The current limitations of audio-visual interaction can therefore be summarized as three problems that call for more effective solutions: 1) identifying the target speaker's voice in the mixed audio signal without being disrupted by interruptions from other speakers; 2) enhancing the target speaker's speech while suppressing background noise, so that the speaker's command can be extracted; and 3) accurately recognizing the target's speech when a new candidate appears whose voice information has not been registered in advance.

active speaker detection, multi-modal correlation learning, speaker detection and speech enhancement